Application of compressed sensing to the simulation of atomic systems 
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Compressed sensing is a metliod that allows a significant reduction in the number of samples 
required for accurate measurements in many applications in experimental sciences and engineering. 
In this work, we show that compressed sensing can also be used to speed up numerical simulations. 
We apply compressed sensing to extract information from the real-time simulation of atomic and 
molecular systems, including electronic and nuclear dynamics. We find that for the calculation of 
vibrational and optical spectra the total propagation time, and hence the computational cost, can 
be reduced by approximately a factor of five. 



INTRODUCTION 

A recent development in the field of data analysis is 
the compressed sensing (CS) (or compressive sampling) 
method [H, Q . The foundation of the method is the con- 
cept of sparsity: a signal expanded in a certain basis 
is said to be sparse when most of the expansion coefB- 
cients are zero. This extra information can be used by 
the CS method to significantly reduce the number of mea- 
surements needed to reconstruct a signal. CS has been 
successfully applied to data acquisition in many differ- 
ent areas [3|. For example, to improve the resolution of 
medical magnetic- resonance imaging Q . It has also been 
applied to the experimental study of atomic and quan- 
tum systems [5|-|7|. 

In this article we show that CS can also be an invalu- 
able tool for some numerical simulations where the opti- 
mal sampling of CS is reflected in a considerable reduc- 
tion of the computational cost. We focus on atomistic 
simulations of nanoscopic systems by using CS to extract 
frequency-resolved information from real-time methods 
such as molecular dynamics (MD) and real-time electron 
dynamics. 

In MD[^, Q the trajectory of the atomic nuclei is ob- 
tained by integrating their equations of motion with an 
interaction obtained from parametrized force-fields or by 
explicitly modeling the electrons [l^l- Many static and 
dynamical properties can be obtained from MD, making 
it one of the most widely used methods to study atom- 
istic systems computationally. As such it is important 
to develop methods that can improve the precision and 
reduce the computational cost of this method, especially 
for ab-initio MD. 

While not as widely used as MD, real-time electron 
dynamics, in particular real-time time-dependent den- 
sity functional theory (TDDFT) is an important 
approach to study linear and non-linear electronic re- 
sponse 12l4l5|. Due to the scalability and parallelizabil- 
ity properties real-time TDDFT is particularly efficient 
for large electronic systems so an additional reduc- 
tion in the computational cost can push the boundaries 
of the system-sizes that can be studied. 



Many physical properties are represented by frequency- 
dependent quantities. To obtain these from a real-time 
framework usually a discrete Fourier transform (FT) is 
used. Our approach is to replace this FT by a calcula- 
tion of the Fourier coefficients based on the CS method. 
To obtain a given frequency resolution, the CS method 
requires a total propagation time that is several times 
smaller than that required by a FT. 

CS has the potential to provide across-the-board 
speedup for many applications involving the computa- 
tion of sparse spectra. Moreover, this speedup may be 
obtained without making any changes to the underly- 
ing propagation code used in different types of electronic 
and nuclear calculations; one simply replaces the FT al- 
gorithm with the CS method, making the approach quite 
straightforward to implement. This paper introduces 
CS and demonstrates its broad utility in computational 
chemistry and physics by applying it to the calculation of 
various nuclear and electronic spectra of small molecules. 
The resulting computer code is available as open-source 
software. 

The article is structured as follows. We first introduce 
the CS method and show how it may be applied to the 
determination of Fourier coefficients. Next, we apply CS 
to the calculation of vibrational, optical absorption, and 
circular dichroism spectra. We then proceed to a discus- 
sion of the numerical methods used in our CS implemen- 
tation. Finally, we offer conclusions and an outlook. 



COMPRESSED SENSING 

In this section, we briefly introduce the application of 
the CS method to the calculation of Fourier coefficients. 
More details about the method and its origins may be 
found in Refs. (TMl- 

For simplicity, we assume that we want to calculate 
a certain frequency-resolved quantity g(u>) that is given 
by the sine transform of a certain time-resolved function 
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(the analysis is equally valid for the cosine transform). 
Since we are interested in numerical solutions, we need 
to think in terms of discrete quantities. In this case 
we want to obtain a series of values {<?i,g2j ■ ■ ■ ,gN^} at 
Ni^ equidistant frequencies ujj = Awj, from the known 
set of time-resolved values {hi, h2, ■ ■ ■ , Iin^} given at Nt 
equidistant times tj = Atj. 

In principle, the gk set can be directly obtained using 
the discrete FT, 
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However, if we expect that many of the Fourier coeffi- 
cients are zero, a property known as sparsity, we can use 
this additional information to obtain more precise results. 
This is the basis for the CS scheme. 

We start by reformulating the discrete Fourier trans- 
form in eq. ([5]) as a matrix inversion problem. From this 
perspective, we are trying to solve the linear equation for 
9, 



Fg^h, 



where F is the Ni^ x Nt Fourier matrix with entries 
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Our objective is to obtain sensible results with Nt as 
small as possible. Thus, we are interested in the case 
Nuj > Nt, where the linear system is under-determined, 
and there are many solutions for g (in fact, one of them 
will be given by eq. From all the solutions of cq. 
we select the one that has the largest number of zero co- 
efficients: the sparsest solution. This turns out to be 
equivalent to the so-called basis-pursuit (BP) optimiza- 
tion problem [l8| 
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subject to Fg = h , 



(5) 



which is what one solves in practice (where \g\i = \gk\ 
is the standard 1-norm). 

The CS scheme can be generalized to allow for a certain 
amount of noise in the time-resolved signal. In this case 
the problem to be solved is known as basis-pursuit de- 
noising (BPDN) 
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subject to \Fg — h\ <ri , 



(6) 



VIBRATIONAL SPECTRA 

MD can be used to obtain information about the vi- 
brational modes of atomic systems. Experimentally, the 
quantities that usually give access to the vibrational 
modes are the infrared and Raman spectra that can be 
obtained from MD as the Fourier components of the elec- 
tronic polarization and polarizability, respectively. If we 
are only interested in the vibrational frequencies, from 
the nuclear velocities, {fj}, we can calculate the velocity 
autocorrelation function 
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whose cosine transform is the vibrational frequency dis- 
tribution (ii 



J{lo) = / At-i{t)cos{ujt) 



(8) 



Since this spectrum is composed of a finite number of 
frequencies (less than three times the number of atoms 
in the system) , the calculation is ideal for the CS method. 

To illustrate the properties of the CS method we start 
with a simple case, the single vibrational frequency of a 
diatomic molecule, Na2, that we simulate using ab-initio 
molecular dynamics. In Fig. [1] we show how the vibra- 
tional spectrum depends on the amount of time for which 
the velocity autocorrelation is calculated. While the dis- 
crete FT requires long times to resolve the vibrational 
frequency, the CS method gives a well-defined peak even 
with less than one oscillation of the molecular vibrational 
mode. That the peak is well defined, however, does not 
imply that peak position is converged. As it can be seen 
in Fig. [21 the peak position oscillates with the total time 
until it converges to the proper value after a few periods 
are sampled. Still, the result converges much faster than 
compared with the width of the peak given by a FT. We 
remark that the CS process does not use any additional 
information about the the signal beyond assuming it is 
sparse. 

To further demonstrate the advantages of this ap- 
proach, we now calculate the vibrational spectrum for a 
benzene molecule from a ab-initio MD simulation. Fig. |31 
We can see that the CS approach with 1000 fs obtains a 
spectrum that is better resolved than the FT results for 
5000 fs. This is directly translated into a reduction of 
the computational time by five times or more. It is rea- 
sonable to expect that equivalent gains can be obtained 
for the computer simulation of other vibrational spectro- 
scopies like infrared and Raman. 



where rj represents the level of noise in the signal. This 
is the formulation we use in our case, since we expect a 
certain amount of noise coming from the finite-precision 
numerical calculations (and possibly other sources). 



OPTICAL ABSORPTION SPECTRA 

Optical absorption is an electronic process. While it 
can be calculated from a linear response framework (2ll . 
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FIG. 1: Frequency distribution spectrum of Na2 
calculated using a Fourier transform and compressed 
sensing for different total propagation times: a) 100 fs, 

b) 205.65 fs (pa 1 oscillation period), and c) 4000 fs. 
The left plots show the velocity autocorrelation function 
and the right plots show the frequency spectrum. 
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FIG. 2: Error in the vibrational frequency of Na2 
computed by compressed sensing with respect to total 
time. For comparison we plot the width of the peak 
obtained by using a discrete Fourier transform. The 
width, cr, is calculated by assuming the peak has a 
Gaussian form Aexp[— aj^/(2(T^)]. 
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FIG. 3: Frequency distribution spectrum of benzene. 
Comparison of a compressed sensing calculation with 
1000 fs and a Fourier transform with 5000 fs. 



22|, it can also be obtained from real-time electron dy- 
namics 12[. To obtain the spectrum from real-time dy- 



namics the electronic system is propagated under the 
effect on an electric field of the form E{r,t) = K6{t). 
From the propagation the time-dependent dipole moment 
p{t) is obtained, and from the dipole, the frequency- 
dependent polarizability can be obtained as (atomic units 
are used in the next two sections) 
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In order to obtain the full a tensor three propagations 
are required (with k in different directions. 

The absorption cross-section is related to the trace of 
the imaginary part of the polarizability tensor 



OC 



(10) 



The optical absorption spectra is an ideal candidate for 
the application of GS. For a molecule, the electronic tran- 
sitions between bound states produce a discrete spectrum 
in the low energy region. At higher energies, the transi- 
tions to unbound states produce a continuous spectrum. 
Standard calculation approaches, however, cannot cap- 
ture this continuous spectrum and approximate it as a 
sequence of discrete excitations. 

In Fig. m we show the optical absorption spectrum for 
benzene calculated via real-time TDDFT. There we illus- 
trate the effect of the propagation time on the spectrum 
for CS and FT. From the figure, it is clear that the GS 
method is capable of resolving the spectrum much better 
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FIG. 4: Optical absorption spectra for benzene 
computed from real-time TDDFT with different 
propagation times. Comparison between compressed 
sensing and discrete Fourier transform. Experimental 
results from Ref. 1231. 



and with a shorter propagation time than a discrete FT. 
For a given resolution, the FT requires approximately 5 
times the propagation time as CS (as can be seen, for 
example, by comparing the FT at 25 fs with CS at 5 fs). 



CIRCULAR DICHROISM SPECTRA 

Another property that can be calculated from real- 
time electron dynamics is circular dichroism (CD) spec- 
tra 13, 2^. A CD spectrum measures the difference in 
a chiral molecule's response to left and right circularly- 
polarized light. The real-time calculation is performed in 



the same manner as the optical absorption case, but the 
key quantity to be calculated is now the time-dependent 
orbital magnetization m{t). The frequency-dependent 
electric-magnetic cross-response tensor may be obtained 
as 



/3y(cj) = / dte-"^* [m^it) - mj(0)] . (11) 



Jo 



The rotatory strength, which is the quantity typically 
plotted in CD spectra, is related to the trace of the imag- 
inary part of /3(cj) according to 



R{uj) = — amV /3,i(a 

TTC ^ — ' 



(12) 



The rotatory strength i?(aj) is suitable to the CS scheme 
because it is sparse in frequency space. In fact, the peaks 
in a CD spectrum are located at the same positions as 
in an absorption spectrum. However, the CD spectrum 
contains both positive and negative peaks. 

Fig. [S] compares the CD spectrum for (i?)- 
mcthyloxirane as computed by FT and CS for two dif- 
ferent propagation times (10 fs and 50 fs). As can be 
seen from the figure, for a given propagation time, the 
CS method provides better spectral resolution than the 
discrete FT. In fact, just as with linear absorption, FT re- 
quires a propagation time approximately 5 times as long 
as CS to obtain a comparable spectral resolution (as can 
be seen by comparing the 50 fs FT with the 10 fs CS). 

Fig. [S]also illustrates another feature of CS: unlike a 
direct FT, the CS method is non-linear. Adding together 
time-resolved signals, then applying CS, generally gives 
different results from applying CS first and then adding 
together the results in the frequency domain; this is par- 
ticularly the case if not all the peaks are well-resolved. In 
other words, the use of CS to convert time-resolved data 
into the frequency-domain, as in eq. pT|) . and the calcu- 
lation of the trace, as in eq. ([T2|) . do not commute. Hence, 
there are two approaches to obtain the CD spectrum: we 
can perform CS for each propagation direction and then 
compute the trace, or we can compute the trace in the 
time-domain and then perform CS. Both approaches are 
shown in Fig. [SJ at 10 fs, they give similar but not iden- 
tical results. This is to be expected: the ability of CS to 
resolve peaks depends on the sparsity of the spectrum. 
Since each propagation direction is sparser than the sum 
of all three, CS resolves more peaks at 10 fs when it is per- 
formed prior to the trace. For longer propagation times 
(50 fs), all of the peaks are more fully resolved and the 
two approaches converge. In any event, both approaches 
to CS provide much improved resolution over a direct FT 
for a given propagation time. 
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FIG. 5: Circular dichroism spectrum computed for 

(iZ)-methyloxirane from real-time TDDFT with 
different propagation times. Comparison between 
discrete Fourier transform and compressed sensing 
(CS). Since the CS process is non-linear we compute the 
spectra in two ways. CS before trace: the spectra is 
calculated for each direction using CS and then the 
trace, eq. ((T^ . is computed. CS after trace: the trace is 
calculated in the time domain and then the CS process 
is used over this averaged signal. 



NUMERICAL METHODS 

Numerically, to find a spectrum using the CS method 
we need to solve eq. This is not a trivial prob- 

lem, so we rely on the SPGLl algorithm developed by 
van den Berg and Friedlander (26|. To avoid numerical 



stability issues we work with a normalized BPDN prob- 
lem, where the factors of the F matrix, eq. (|4]), are left 
out and h is normalized. The missing factors are included 
in g after the solution is found. This has the additional 
advantage of making the noise parameter rj of eq. ([5]) 
dimensionless. 

Since we do not a have an a priori estimate for rj, we 
do not set it directly. As the SPGLl algorithm finds 
a sequence of approximated solutions with a decreasing 
value of 77, we set the target value to zero. We assume 
that the calculation is converged when the value of rj falls 
below a certain threshold (10~^) or the active space of the 
system, the set of non-zero coefficients, has not changed 
for a certain number of iterations (50). In the former case 
we consider that a solution of the BP problem, eq. ([5|), 
has been found. For all the calculations presented here 
?; < 10-3. 

CS is much more costly numerically than the discrete 
FT approach, as it usually involves several hundreds of 
matrix multiplications. However, this is not a problem 
for application since usually the solution process nor- 
mally only takes a few minutes, much less than the com- 
putation time required to simulate the real-time dynam- 
ics of large atomic systems. 

All the calculations presented in this article were per- 
formed using the OCTOPUS code [H, 1131 at the (time- 
dependent) density functional theory level with the PBE 
exchange correlation functional |28| . The adiabatic 
molecular dynamics calculations were performed from 
first principles using the modified Ehrenfest method 0, 
[30I I with a fi factor of 30 for Na2 and 5 for benzene. The 
systems were given initial velocities equivalent to 300 K 
and the MD is performed at constant energy. 

All calculations used norm-conserving pseudo- 
potentials with a real-space grid discretization. The 
shape of the grid is a union of boxes around each atom. 
For Na2 we use a spacing of 0.375 a.u. with a sphere 
radius of 12 a.u., and the MD time-step is 0.057 fs. 
For benzene, the grid-spacing is 0.35 a.u., the radius 
is 14 a.u., and the time-step is 0.0085 fs for MD and 
0.0017 fs for real-time TDDFT. For (iZ)-methyloxirane, 
the spacing is 0.378 a.u., the sphere radius is 15.1 a.u., 
and the time-step is 0.0008 fs for real-time TDDFT. For 
the vibrational spectrum calculation we use a time-step 
10 times the one of the MD, the energy step is 0.01 
1/cm, and the maximum spectrum energy is 5000 1/cm. 
For the benzene optical absorption spectra, we use a 
time-step of 0.0017 fs, the energy step is 0.027 eV, 
and the maximum spectrum energy is 820 eV. For 
the (i?)-methyloxirane circular dichroism spectra, the 
time-step is 0.0008 fs, the energy step is 0.01 eV, and 
the maximum spectrum energy is 330 eV. The structure 
of benzene was taken from ref. 3l| and the structure of 
(i?)-methyloxirane was taken from ref. (32| . 

All discrete FTs were performed using third-order 
polynomial damping: each signal at time t was multi- 
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plied by p{t) = 1 - 3{t/T)'^ + 2{t/Tf prior to Fourier 
transform, where T is the time-length of the signal. 

The SPGLl method used for CS was implemented into 
OCTOPUS based on a Fortran translation of the original 
Matlab code of van den Berg and Friedlander [s^. We 
plan to release this implementation as a standalone tool 
in the near future (for the moment the code can be ob- 
tained from the OCTOPUS repository). 



CONCLUSIONS 

We have shown that the CS method can be applied to 
the numerical calculation of different kinds of atomic and 
electronic spectra. This results in a significant reduction 
of the computational time required for the numerical sim- 
ulations. The effect of this reduction is to increasing the 
size of the systems that are currently accessible to numer- 
ical simulations, and to make possible simulations with 
more precise, but more costly, methods. It also means 
that other types of simulations become more affordable 
from a real-time perspective, for example the combined 
dynamics of nuclei and electrons that are constrained to 
short simulation times by the fast dynamics of the elec- 
tron. 

In this work, we have shown the application of CS 
to the calculation of a few types of spectra, but the 
method most likely can be applied to other quantities 
as well, such as non-linear optical response Ij, [l^ , mag- 



netic circular dichroism 34 



semi-classical nuclear dy- 
Of course. 



etc. 



namics [35|, |36[ , 2D spectroscopy [37|, 
the method is not limited to atomistic simulations and 
could be applied to simulations in all scientific fields. 

The main limitation of the CS approach is that it will 
not be beneficial for quantities that are not sparse. In 
such case, the performance of CS will be equivalent to 
a standard discrete FT. There are some cases where the 
sparsity requirement might be circumvented. For exam- 
ple, though the real part of the polarizability tensor is not 
sparse, it could be computed from the imaginary part by 
using the Kramers-Kronig relation. Another example is 
crystalline systems (soj . where there is a continuum of 
excitation energies. In this case it might be possible to 
apply the CS scheme to each fc-point separately. 

We expect that compressed sensing will become widely 
used in the scientific computing community once its ad- 
vantageous properties become more widely known. The 
main difficulty in the adoption of CS is that it is more 
complex to implement than a discrete FT. This problem 
can be solved by providing libraries and utilities that 
can be used by researchers as a black-box. Other issue 
of the CS approach it has some aspects that might re- 
sult counter-intuitive at first. For example, non-linearity 
and the fact that with CS the peak width is not always 
related to the convergence of the spectrum. 

Moreover, we believe that our direct application of the 



compressed sensing methodology to numerical simulation 
opens the path for more challenging applications. An 
idea that we could call "compressed computing", where 
the principles of sparsity could be used to design algo- 
rithms for numerical simulations that have a reduced 
computational cost of calculations not only in the num- 
ber of operations, but also in memory and data transfer 
bandwidth requirements. 
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